Tree-based Fitted Q-iteration for Multi-Objective Markov Decision Processes in Water Resources Management
Abstract
Multi-Objective Markov Decision Processes (MOMDPs) provide an effective modeling framework for multi-objective decision-making problems involving water resources systems. The traditional approach to solving these problems is to consider many single-objective problems (resulting from different combinations of the original problem objectives), each solved using standard optimization techniques. This paper presents a new approach to MOMDPs, based on batch-mode reinforcement learning (RL), that learns the operating policies for all linear combinations of weights assigned to the objectives in a single training process. The key idea is to extend the continuous approximation of the action-value function, which single-objective RL algorithms perform over the state-action space, to the weight space as well. The batch-mode nature of the algorithm makes it possible to enrich the training data without further interaction with the controlled system. The approach is first demonstrated on a numerical test case involving a two-objective reservoir, and then evaluated on a real-world case study concerning the optimal operation of the Hoa Binh water reservoir in Vietnam. Experimental results on the test case show that the proposed approach (named MOFQI) becomes com...
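The mechanism the abstract describes, a single batch-mode fitted Q-iteration run over the joint state-action-weight space, can be sketched in a few lines. The sketch below is illustrative only: the toy reservoir dynamics, the reward definitions, and the 1-nearest-neighbour regressor (standing in for the paper's tree-based regression, to keep the example numpy-only) are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Toy two-objective reservoir: state = storage level 0..4, action = release {0, 1}.
# Objective 1 penalises high storage (flood protection); objective 2 rewards
# releases (water supply). All dynamics and names here are illustrative.

def step(s, a, rng):
    inflow = int(rng.integers(0, 2))            # stochastic inflow of 0 or 1 unit
    s_next = int(np.clip(s - a + inflow, 0, 4))
    r = np.array([-float(s_next), float(a)])    # (flood, supply) reward vector
    return s_next, r

def collect_batch(n, rng):
    """Batch of transitions (s, a, r_vec, s') gathered off-line."""
    batch = []
    for _ in range(n):
        s, a = int(rng.integers(0, 5)), int(rng.integers(0, 2))
        s_next, r = step(s, a, rng)
        batch.append((s, a, r, s_next))
    return batch

class KNNRegressor:
    """1-nearest-neighbour stand-in for the tree-based regressor of FQI."""
    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y, float)
        return self
    def predict(self, X):
        d = ((np.asarray(X, float)[:, None, :] - self.X[None, :, :]) ** 2).sum(-1)
        return self.y[d.argmin(axis=1)]

def mofqi(batch, n_iter=10, gamma=0.9, n_weights=5):
    # Enlarge each sample with weights w = (w1, 1 - w1): the Q-function is
    # approximated over the joint (state, action, weight) space, so one
    # training run covers every linear combination of the objectives.
    ws = np.linspace(0.0, 1.0, n_weights)
    X, R, Sn = [], [], []
    for (s, a, r, s_next) in batch:
        for w1 in ws:
            X.append([s, a, w1])
            R.append(w1 * r[0] + (1.0 - w1) * r[1])  # scalarised reward
            Sn.append(s_next)
    X, R, Sn = np.array(X), np.array(R), np.array(Sn)
    model = None
    for _ in range(n_iter):
        if model is None:
            y = R                                    # first iteration: Q1 = reward
        else:                                        # Bellman targets from Q-hat
            q_next = np.stack([
                model.predict(np.column_stack([Sn, np.full(len(Sn), a), X[:, 2]]))
                for a in (0, 1)], axis=1)
            y = R + gamma * q_next.max(axis=1)
        model = KNNRegressor().fit(X, y)
    return model

def greedy_action(model, s, w1):
    q = [model.predict(np.array([[s, a, w1]]))[0] for a in (0, 1)]
    return int(np.argmax(q))

rng = np.random.default_rng(0)
model = mofqi(collect_batch(200, rng))
a_flood = greedy_action(model, 4, 1.0)   # policy under pure flood-protection weight
a_supply = greedy_action(model, 4, 0.0)  # policy under pure supply weight
```

Once trained, the single model is queried with any weight `w1` in [0, 1] to recover the greedy operating policy for that trade-off, without re-solving a separate single-objective problem per weight combination.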
Related Papers
Multi-Objective Markov Decision Processes for Data-Driven Decision Support
We present new methodology based on Multi-Objective Markov Decision Processes for developing sequential decision support systems from data. Our approach uses sequential decision-making data to provide support that is useful to many different decision-makers, each with different, potentially time-varying preferences. To accomplish this, we develop an extension of fitted-Q iteration for multiple o...
Online Reinforcement Learning for Real-Time Exploration in Continuous State and Action Markov Decision Processes
This paper presents a new method to learn online policies in continuous state, continuous action, model-free Markov decision processes, with two properties that are crucial for practical applications. First, the policies are implementable with a very low computational cost: once the policy is computed, the action corresponding to a given state is obtained in logarithmic time with respect to the...
Optimizing Spoken Dialogue Management from Data Corpora with Fitted Value Iteration
In recent years machine learning approaches have been proposed for dialogue management optimization in spoken dialogue systems. It is customary to cast the dialogue management problem into a Markov Decision Process (MDP) and to find the associated optimal policy using Reinforcement Learning (RL) algorithms. Yet, the dialogue state space is usually very large (even infinite) and standard RL algo...
Optimizing spoken dialogue management with fitted value iteration
In recent years machine learning approaches have been proposed for dialogue management optimization in spoken dialogue systems. It is customary to cast the dialogue management problem into a Markov Decision Process (MDP) and to find the associated optimal policy using Reinforcement Learning (RL) algorithms. Yet, the dialogue state space is usually very large (even infinite) and standard RL algo...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...